Proximal Policy Optimization (PPO) is a highly popular policy-based deep reinforcement learning (DRL) approach. However, we observe that the homogeneous exploration process in PPO could cause an unexpected stability issue in the training phase. To address this issue, we propose PPO-UE, a PPO variant equipped with self-adaptive uncertainty-aware explorations (UEs) based on a ratio uncertainty level. The proposed PPO-UE is designed to improve convergence speed and performance with an optimized ratio uncertainty level. In an extensive sensitivity analysis that varies the ratio uncertainty level, PPO-UE considerably outperforms the baseline PPO on Roboschool continuous control tasks.
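For readers unfamiliar with the probability ratio that PPO works with, the following is a minimal sketch of the standard clipped surrogate objective of baseline PPO. It is not the PPO-UE method, whose ratio uncertainty level the abstract does not define; the function name `clipped_surrogate` and the default `clip_eps=0.2` are assumptions chosen for illustration.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of baseline PPO (to be maximized).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and the data-collecting policy. clip_eps is an assumed default.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic choice between the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage the objective stops rewarding ratio increases beyond `1 + clip_eps`; with a negative advantage the unclipped term is kept whenever it is the smaller of the two.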
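The sixth assessment of the Intergovernmental Panel on Climate Change (IPCC) states that "cumulative net CO2 emissions over the last decade (2010-2019) are about the same size as the 11 remaining carbon budget likely to limit warming to 1.5C (medium confidence)." Such reports directly feed public discourse, but nuances such as the degree of belief and of confidence are often lost. In this paper, we propose a formal account that allows such degrees of belief, and the associated confidence, to be used to label arguments in abstract argumentation settings. Unlike other proposals in probabilistic argumentation, we focus on the task of probabilistic inference over a chosen query, building on Sato's distribution semantics, which has already been shown to encompass a variety of cases, including the semantics of Bayesian networks. Borrowing from the vast literature on such semantics, we examine how such tasks can be handled in practice when considering uncertain probabilities, and we discuss the connections with existing proposals for probabilistic argumentation.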
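In second-order uncertain Bayesian networks, the conditional probabilities are known only within distributions, i.e., probabilities over probabilities. The delta method has been applied to extend exact first-order inference methods so that both means and variances are propagated through sum-product networks derived from Bayesian networks, thereby characterizing epistemic uncertainty, that is, the uncertainty in the model itself. Alternatively, second-order belief propagation has been demonstrated for polytrees, but not for general directed acyclic graph structures. In this work, we extend loopy belief propagation to the setting of second-order Bayesian networks, giving rise to second-order loopy belief propagation (SOLBP). For second-order Bayesian networks, SOLBP generates inferences that are consistent with those generated by sum-product networks, while being more computationally efficient and scalable.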
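As a rough illustration of how means and variances can be pushed through the sum and product nodes of a network, the sketch below applies the first-order delta method to two inputs. It assumes the two inputs are independent, which a real sum-product network does not guarantee, and the function names are hypothetical; it is not the SOLBP algorithm.

```python
def product_moments(mean_x, var_x, mean_y, var_y):
    """First-order delta-method moments of x * y, assuming x and y independent."""
    mean = mean_x * mean_y
    # Squared partial derivatives at the means, weighted by the input variances.
    var = (mean_y ** 2) * var_x + (mean_x ** 2) * var_y
    return mean, var


def sum_moments(mean_x, var_x, mean_y, var_y):
    """Moments of x + y, assuming x and y independent (exact for a sum)."""
    return mean_x + mean_y, var_x + var_y
```

For example, `product_moments(0.5, 0.01, 0.4, 0.02)` combines two uncertain probabilities into an approximate mean and variance for their product; correlated inputs would need an additional covariance term.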
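When historical data are limited, the conditional probabilities associated with the nodes of a Bayesian network are uncertain and can be estimated empirically. Second-order estimation methods provide a framework for both estimating the probabilities and quantifying the uncertainty in these estimates. We refer to these cases as uncertain or second-order Bayesian networks. When such data are complete, i.e., all variable values are observed for each instantiation, the conditional probabilities are known to be Dirichlet-distributed. This paper improves the current state-of-the-art approaches for learning uncertain Bayesian networks by enabling them to learn parameters (i.e., conditional probabilities) from incomplete data. We extensively evaluate various methods for learning the posterior of the parameters through the desired and empirically derived strength of confidence bounds for various queries.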
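The complete-data case mentioned above can be sketched as follows: with a Dirichlet prior and fully observed counts, the posterior over one conditional probability table row is again Dirichlet, and its mean and variance have closed forms. The function name and the symmetric prior value `prior=1.0` are assumptions for illustration; the incomplete-data methods the paper evaluates are not shown.

```python
def dirichlet_posterior(counts, prior=1.0):
    """Posterior mean and variance of each probability given complete counts.

    counts: observed count per outcome of one conditional distribution.
    prior: assumed symmetric Dirichlet pseudo-count added to every outcome.
    """
    alphas = [prior + c for c in counts]
    strength = sum(alphas)  # Dirichlet strength (sum of all parameters)
    means = [a / strength for a in alphas]
    variances = [
        a * (strength - a) / (strength ** 2 * (strength + 1.0))
        for a in alphas
    ]
    return means, variances
```

The variances shrink as counts accumulate, which is the second-order information that a point estimate of the probabilities alone would discard.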
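An in-depth understanding of uncertainty is the first step toward making effective decisions under uncertainty. Machine/deep learning (ML/DL) has been heavily leveraged to solve complex problems that involve processing high-dimensional data. However, reasoning about and quantifying different types of uncertainty has been explored much less in ML/DL than in other artificial intelligence (AI) domains. In particular, belief/evidence theories have been studied in knowledge representation and reasoning (KRR) since the 1960s to reason about and measure uncertainty in order to improve decision-making effectiveness. We found that only a few studies have leveraged the mature uncertainty research in belief/evidence theories within ML/DL to tackle complex problems under different types of uncertainty. In this survey paper, we discuss several popular belief theories and their core ideas for dealing with the causes and types of uncertainty and for quantifying them, and we discuss their applicability in ML/DL. In addition, we discuss three main approaches that leverage belief theories in deep neural networks (DNNs), namely evidential DNNs, fuzzy DNNs, and rough DNNs, in terms of their uncertainty causes, types, and quantification methods, as well as their applicability in diverse problem domains. Based on our in-depth survey, we discuss insights, lessons learned, the limitations of the current state of the art in bridging belief theories and ML/DL, and, finally, future research directions.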
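Recent efforts in interpretable deep learning models have shown that concept-based explanation methods achieve accuracy competitive with standard end-to-end models and enable reasoning about, and intervention on, high-level visual concepts extracted from images, e.g., identifying wing color and beak length for bird-species classification. However, these concept bottleneck models rely on a necessary and sufficient set of predefined concepts, which is intractable for complex tasks such as video classification. For complex tasks, the labels and the relationships between visual elements span many frames, e.g., identifying a bird flying or catching prey, which necessitates concepts at various levels of abstraction. To this end, we present Codex, an automatic concept discovery and extraction module that rigorously composes a necessary and sufficient set of concept abstractions for concept-based video classification. Codex identifies a rich set of complex concept abstractions from natural language explanations of videos, obviating the need to predefine the amorphous set of concepts. To demonstrate the viability of our method, we construct two new public datasets that combine existing complex video classification datasets with short, crowd-sourced natural language explanations of their labels. Our method elicits inherently complex concept abstractions in natural language in order to generalize concept-bottleneck methods to complex tasks.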
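Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.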
The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that utilize technology to aid in the conservation of wildlife. In this article, we will use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind and provide a framework for creating successful tools. These case studies include a range of complexities, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce and inform current and future researchers in the field of conservation technology and provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.
We present the interpretable meta neural ordinary differential equation (iMODE) method to rapidly learn generalizable (i.e., not parameter-specific) dynamics from trajectories of multiple dynamical systems that vary in their physical parameters. The iMODE method learns meta-knowledge, the functional variations of the force field of dynamical system instances without knowing the physical parameters, by adopting a bi-level optimization framework: an outer level capturing the common force field form among studied dynamical system instances and an inner level adapting to individual system instances. A priori physical knowledge can be conveniently embedded in the neural network architecture as inductive bias, such as conservative force field and Euclidean symmetry. With the learned meta-knowledge, iMODE can model an unseen system within seconds, and can inversely reveal knowledge on the physical parameters of a system, or serve as a Neural Gauge to "measure" the physical parameters of an unseen system from observed trajectories. We test the validity of the iMODE method on bistable, double pendulum, Van der Pol, Slinky, and reaction-diffusion systems.
While the brain connectivity network can inform the understanding and diagnosis of developmental dyslexia, its cause-effect relationships have not yet been sufficiently examined. Employing electroencephalography signals and a band-limited white noise stimulus at 4.8 Hz (prosodic-syllabic frequency), we measure the phase Granger causalities among channels to identify differences between dyslexic learners and controls, thereby proposing a method to calculate directional connectivity. As causal relationships run in both directions, we explore three scenarios, namely channels' activity as sources, as sinks, and in total. Our proposed method can be used for both classification and exploratory analysis. In all scenarios, we find confirmation of the established right-lateralized Theta sampling network anomaly, in line with the temporal sampling framework's assumption of oscillatory differences in the Theta and Gamma bands. Further, we show that this anomaly primarily occurs in the causal relationships of channels acting as sinks, where it is significantly more pronounced than when only total activity is observed. In the sink scenario, our classifier obtains 0.84 and 0.88 accuracy and 0.87 and 0.93 AUC for the Theta and Gamma bands, respectively.